326 research outputs found

    DROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range Data

    We introduce the DROW detector, a deep learning based detector for 2D range data. Laser scanners are lighting invariant, provide accurate range data, and typically cover a large field of view, making them interesting sensors for robotics applications. So far, research on detection in laser range data has been dominated by hand-crafted features and boosted classifiers, potentially losing performance due to suboptimal design choices. We propose a Convolutional Neural Network (CNN) based detector for this task. We show how to effectively apply CNNs for detection in 2D range data, and propose a depth preprocessing step and voting scheme that significantly improve CNN performance. We demonstrate our approach on wheelchairs and walkers, obtaining state-of-the-art detection results. Apart from the training data, however, none of our design choices limits the detector to these two classes. We provide a ROS node for our detector and release our dataset containing 464k laser scans, of which 24k were annotated. Comment: Lucas Beyer and Alexander Hermans contributed equally

    Towards a Principled Integration of Multi-Camera Re-Identification and Tracking through Optimal Bayes Filters

    With the rise of end-to-end learning through deep learning, person detectors and re-identification (ReID) models have recently become very strong. Multi-camera multi-target (MCMT) tracking has not fully gone through this transformation yet. We intend to take another step in this direction by presenting a theoretically principled way of integrating ReID with tracking formulated as an optimal Bayes filter. This conveniently side-steps the need for data association and opens up a direct path from full images to the core of the tracker. While the results are still sub-par, we believe that this new, tight integration opens many interesting research opportunities and leads the way towards full end-to-end tracking from raw pixels. Comment: First two authors have equal contribution. This is initial work in a new direction, not a benchmark-beating method. v2 only adds acknowledgements and fixes a typo in e-mail

    Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

    Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, at less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for more informed scaling. Comment: 10 pages, 7 figures, 9 tables. Version 2: Layout fixe

    Industrial Infrastructure: Translocal Planning for Global Production in Ethiopia and Argentina

    Current development and re-development of industrial areas cannot be adequately understood without taking into account the organisational structures and logistics of commodity production on a planetary scale. Global production networks not only contribute to the reconfiguration of urban spatial and economic structures in many places, but also give rise to novel transnational actor constellations, thus reconfiguring planning processes. This article explores such constellations and their urban outcomes by investigating two current cases of industrial development linked with multilateral transport-infrastructure provisioning in Ethiopia and Argentina. In both cases, international partners are involved, with stakeholders based in China playing particularly significant roles. In Mekelle, Ethiopia, we focus on the establishment of a commodity hub through the implementation of new industry parks for global garment production and road and rail connections to international seaports. In the Rosario metropolitan area in Argentina, major cargo rail and port facilities are under development to expand the country’s most important ports for soybean export. By mapping the physical architectures of the industrial and infrastructure complexes and their urban contexts and tracing the translocal actor constellations involved in infrastructure provisioning and operation, we analyse the spatial impacts of the projects as well as the related implications for planning governance. The article contributes to emergent scholarship and theorisations of urban infrastructure and global production networks, as well as policy mobility and the transnational constitution of planning knowledge and practices.